Zosterops RADseq filtering

1 Load R packages

library(vcfR) #v1.14.0
library(ggplot2) #v3.4.1
library(adegenet) #v2.1.10
library(SNPfiltR) #v1.0.1
library(StAMPP) #v1.6.3

We will follow the SNP filtering protocol outlined by the SNPfiltR R package in (DeRaad 2022).

2 Read in data

We will read in the unfiltered SNPs for all samples output as a vcf file after using BWA (Li and Durbin 2009) to map our raw RADseq data to the Zosterops japonicus reference genome (available at: https://www.ncbi.nlm.nih.gov/assembly/GCA_017612475.1) (Venkatraman, Fleischer, and Tsuchiya 2021).

#read in vcf
vcfR <- read.vcfR("~/Desktop/cali.zosterops.rad/zost.unfiltered.snps.vcf.gz")
#check the metadata for the raw, unfiltered SNP dataset
vcfR
***** Object of Class vcfR *****
155 samples
3807 CHROMs
236,767 variants
Object size: 1155 Mb
65.49 percent missing data
*****        *****         *****
#read in sample info csv
sample.info<-read.csv("~/Desktop/cali.zosterops.rad/zosterops.RAD.sampling.csv")

#make sure sampling file matches the samples in your vcf
sample.info$ID == colnames(vcfR@gt)[-1]
Warning in sample.info$ID == colnames(vcfR@gt)[-1]: longer object length is not
a multiple of shorter object length
  [1]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
 [13]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
 [25]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
 [37]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
 [49]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
 [61]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
 [73]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
 [85]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE
 [97] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[109] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[121] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[133] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[145] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[157] FALSE FALSE FALSE FALSE FALSE

3 Implement quality filters that don’t involve missing data

This is because removing low data samples will alter percentage/quantile based missing data cutoffs, so we wait to implement those until after deciding on our final set of samples for downstream analysis

#hard filter to minimum depth of 5, and minimum genotype quality of 30
vcfR<-hard_filter(vcfR=vcfR, depth = 3, gq = 30)
8% of genotypes fall below a read depth of 3 and were converted to NA
1.37% of genotypes fall below a genotype quality of 30 and were converted to NA

Use the function allele_balance() to filter for allele balance from Puritz SNP filtering tutorial “Allele balance: a number between 0 and 1 representing the ratio of reads showing the reference allele to all reads, considering only reads from individuals called as heterozygous, we expect that the allele balance in our data (for real loci) should be close to 0.5”

#execute allele balance filter
vcfR<-filter_allele_balance(vcfR, min.ratio = .10, max.ratio = .90)
0.35% of het genotypes (0.01% of all genotypes) fall outside of 0.1 - 0.9 allele balance ratio and were converted to NA

We now want to implement a max depth filter (super high depth loci are likely multiple loci stuck together into a single paralogous locus, which we want to remove).

#visualize and pick appropriate max depth cutoff
max_depth(vcfR)
cutoff is not specified, exploratory visualization will be generated.

dashed line indicates a mean depth across all SNPs of 94.1

#filter vcf by the max depth cutoff you chose
vcfR<-max_depth(vcfR, maxdepth = 250)
maxdepth cutoff is specified, filtered vcfR object will be returned
14.57% of SNPs were above a mean depth of 250 and were removed from the vcf

Remove SNPs from the vcfR that have become invariant following the removal of questionable genotypes above, and see how many SNPs we have left after this initial set of filters

vcfR<-min_mac(vcfR, min.mac = 1)
11.72% of SNPs fell below a minor allele count of 1 and were removed from the VCF

vcfR
***** Object of Class vcfR *****
155 samples
3415 CHROMs
178,578 variants
Object size: 713.7 Mb
74.07 percent missing data
*****        *****         *****

4 Setting the missing data by sample threshold

Check out the exploratory visualizations and make decisions about which samples to keep for downstream analysis.

#run function to visualize per sample missingness
miss<-missing_by_sample(vcfR)
No popmap provided

#run function to visualize per SNP missingness
by.snp<-missing_by_snp(vcfR)
cutoff is not specified, exploratory visualizations will be generated
Picking joint bandwidth of 0.0655
Warning: Removed 310 rows containing non-finite values
(`stat_density_ridges()`).

Warning: Removed 2 rows containing missing values (`geom_point()`).

Based on these visualizations, we will want to drop the worst sequenced samples that are dragging down the rest of the dataset. Drop those samples based on an approximate missing data proportion cutoff here (this can always be revised if we end up feeling like this is too lenient or stringent later):

#run function to drop samples above the threshold we want from the vcf
vcfR.trim<-missing_by_sample(vcfR=vcfR, cutoff = .9)
25 samples are above a 0.9 missing data cutoff, and were removed from VCF

#remove invariant sites generated by sample trimming
vcfR.trim<-min_mac(vcfR.trim, min.mac = 1)
1.58% of SNPs fell below a minor allele count of 1 and were removed from the VCF

5 Setting the missing data by SNP threshold

Now we will visualize different per SNP missing data thresholds and identify a value that optimizes the trade-off between amount of missing data and the total number of SNPs retained.

#see what effect trimming samples had on missing data across the dataset
by.snp<-missing_by_snp(vcfR.trim)
cutoff is not specified, exploratory visualizations will be generated
Picking joint bandwidth of 0.0659

#we want to identify the three samples that don't seem to improve much, even as we crank up the missingness per SNP threshold (i.e., the ones lagging out to the right in this plot)
#to do so, we will get an updated list of missingness per sample across a variety of thresholds saved as a dataframe named 'miss'
miss<-missing_by_sample(vcfR.trim)
No popmap provided

#to identify those lagging samples, we will isolate the part of the dataframe corresponing to a 90% completeness per SNP filter, where a handful of samples are clearly lagging
m<-miss[miss$filt == 0.9,]
#print the 10 samples with the lowest number of SNPs retained at a 90% per SNP completeness threshold
m[order(m$snps.retained)[1:10],]
           indiv filt snps.retained
628 ZpalDOT-5746  0.9          6076
611    Zmon27152  0.9          7376
564      ZJsi003  0.9          7478
521      ZJlo010  0.9          8083
599     ZERxx003  0.9          8122
542      ZJja011  0.9          8200
584     ZMOha001  0.9          8355
594     ZMOmo001  0.9          8564
634    Zsim13809  0.9          8641
572      ZJsi031  0.9          8738
#we can now easily remove remaining problematic samples directly with this simple code:
vcfR.trim<-vcfR.trim[,colnames(vcfR.trim@gt) != "ZpalDOT-5746" & colnames(vcfR.trim@gt) != "Zmon27152" & colnames(vcfR.trim@gt) != "ZJsi003" & colnames(vcfR.trim@gt) != "ZJlo010" & colnames(vcfR.trim@gt) != "ZERxx003" & colnames(vcfR.trim@gt) != "ZJja011"]
#drop any SNPs that became invariant because of sample removal
vcfR.trim<-min_mac(vcfR.trim, min.mac = 1)
1.53% of SNPs fell below a minor allele count of 1 and were removed from the VCF

#see what effect trimming samples had on missing data across the dataset
by.snp<-missing_by_snp(vcfR.trim)
cutoff is not specified, exploratory visualizations will be generated
Picking joint bandwidth of 0.0627

Visualize sample relatedness at 99% per-SNP completeness cutoff to see our null expectations if there is no missing data effect

#set 99% completeness per SNP threshold
test<-missing_by_snp(vcfR.trim, cutoff=.99)
cutoff is specified, filtered vcfR object will be returned
99.54% of SNPs fell below a completeness cutoff of 0.99 and were removed from the VCF

missing_by_sample(test) #no samples above 8% missing data
No popmap provided

            indiv filt snps.retained
1         ZJlo012  0.5           789
2         ZJlo015  0.5           789
3         ZJlo016  0.5           789
4         ZJlo017  0.5           789
5         ZJlo021  0.5           789
6         ZJlo022  0.5           789
7         ZJlo024  0.5           789
8         ZJlo031  0.5           789
9         ZJlo045  0.5           782
10        ZJlo046  0.5           789
11        ZJlo050  0.5           788
12        ZJlo052  0.5           789
13        ZJlo055  0.5           789
14        ZJja001  0.5           771
15        ZJja002  0.5           789
16        ZJja003  0.5           789
17        ZJja004  0.5           760
18        ZJja005  0.5           789
19        ZJja009  0.5           789
20        ZJja010  0.5           778
21        ZJja012  0.5           789
22        ZJja013  0.5           789
23        ZJja014  0.5           789
24        ZJja016  0.5           789
25        ZJja030  0.5           789
26        ZJal003  0.5           789
27        Zjal004  0.5           789
28        ZJal005  0.5           789
29        ZJal010  0.5           789
30        ZJal011  0.5           789
31        ZJal012  0.5           789
32        ZJal020  0.5           789
33        ZJst007  0.5           789
34        ZJst009  0.5           789
35        ZJst010  0.5           789
36        ZJst012  0.5           789
37        ZJst014  0.5           789
38        ZJst015  0.5           789
39        ZJin001  0.5           789
40        ZJin003  0.5           789
41        ZJsi002  0.5           789
42        ZJsi017  0.5           781
43        ZJsi032  0.5           747
44        ZJsi023  0.5           789
45        ZJsi024  0.5           768
46        ZJsi027  0.5           758
47        ZJsi028  0.5           787
48        ZJsi029  0.5           789
49        ZJsi031  0.5           737
50        ZPxx001  0.5           789
51        ZPxx002  0.5           789
52        ZMxx002  0.5           789
53        ZMxx003  0.5           789
54        ZMxx004  0.5           789
55        ZMxx005  0.5           766
56        ZMxx006  0.5           789
57       ZMOxx001  0.5           789
58       ZMOxx002  0.5           789
59       ZMOxx003  0.5           782
60       ZMOxx005  0.5           789
61       ZMOha001  0.5           742
62       ZMOwh002  0.5           770
63       ZMOwh003  0.5           789
64       ZMOwh004  0.5           789
65       ZMOwh005  0.5           770
66       ZMOwh006  0.5           789
67       ZMOvu002  0.5           789
68       ZMOvu003  0.5           789
69       ZMOvu004  0.5           787
70       ZMOvu005  0.5           788
71       ZMOmo001  0.5           753
72       ZMOpa001  0.5           788
73       ZMOpa002  0.5           789
74        ZNni001  0.5           783
75       ZERxx002  0.5           789
76       ZERxx004  0.5           789
77       ZERxx005  0.5           789
78      Zsim23588  0.5           789
79      Zsim28142  0.5           789
80      Zsim30897  0.5           787
81      Zpal23498  0.5           782
82      Zpal23522  0.5           789
83      Zmon20893  0.5           789
84      Zmon20892  0.5           781
85      Zmon20902  0.5           789
86      Zmon20909  0.5           789
87      Zsim31166  0.5           784
88      Zsim31159  0.5           789
89      Zmey17876  0.5           789
90      Zmey17877  0.5           789
91      Zmey17852  0.5           788
92      Zmey17922  0.5           789
93      Zmey17920  0.5           789
94      Zmey17925  0.5           789
95      Zmey17923  0.5           789
96      Zery28090  0.5           789
97      Zery28091  0.5           781
98      Zery28088  0.5           731
99       Zni10863  0.5           784
100      Zni14341  0.5           789
101      Zni19650  0.5           788
102      Zni17984  0.5           784
103  ZjaDOT-10981  0.5           789
104   ZjaDOT-5235  0.5           789
105 Z_LA_122866_2  0.5           789
106 Z_LA_122577_2  0.5           789
107 Z_LA_122188_2  0.5           789
108     Zsim13809  0.5           758
109     Zsim13773  0.5           789
110      Zsim6797  0.5           789
111     Zsim10336  0.5           781
112     Zsim11362  0.5           789
113     Zsim11102  0.5           789
114     Zsim11220  0.5           789
115     Zmon20899  0.5           784
116     Zmon28375  0.5           780
117      Zev13949  0.5           789
118      Zev31650  0.5           781
119      Zev28451  0.5           789
120   Z_HI_BRY431  0.5           787
121   Z_HI_NAN290  0.5           789
122   Z_HI_WAI087  0.5           757
123   Z_HI_WAI078  0.5           781
124   Z_HI_SOL783  0.5           789
125       ZJlo012  0.6           789
126       ZJlo015  0.6           789
127       ZJlo016  0.6           789
128       ZJlo017  0.6           789
129       ZJlo021  0.6           789
130       ZJlo022  0.6           789
131       ZJlo024  0.6           789
132       ZJlo031  0.6           789
133       ZJlo045  0.6           782
134       ZJlo046  0.6           789
135       ZJlo050  0.6           788
136       ZJlo052  0.6           789
137       ZJlo055  0.6           789
138       ZJja001  0.6           771
139       ZJja002  0.6           789
140       ZJja003  0.6           789
141       ZJja004  0.6           760
142       ZJja005  0.6           789
143       ZJja009  0.6           789
144       ZJja010  0.6           778
145       ZJja012  0.6           789
146       ZJja013  0.6           789
147       ZJja014  0.6           789
148       ZJja016  0.6           789
149       ZJja030  0.6           789
150       ZJal003  0.6           789
151       Zjal004  0.6           789
152       ZJal005  0.6           789
153       ZJal010  0.6           789
154       ZJal011  0.6           789
155       ZJal012  0.6           789
156       ZJal020  0.6           789
157       ZJst007  0.6           789
158       ZJst009  0.6           789
159       ZJst010  0.6           789
160       ZJst012  0.6           789
161       ZJst014  0.6           789
162       ZJst015  0.6           789
163       ZJin001  0.6           789
164       ZJin003  0.6           789
165       ZJsi002  0.6           789
166       ZJsi017  0.6           781
167       ZJsi032  0.6           747
168       ZJsi023  0.6           789
169       ZJsi024  0.6           768
170       ZJsi027  0.6           758
171       ZJsi028  0.6           787
172       ZJsi029  0.6           789
173       ZJsi031  0.6           737
174       ZPxx001  0.6           789
175       ZPxx002  0.6           789
176       ZMxx002  0.6           789
177       ZMxx003  0.6           789
178       ZMxx004  0.6           789
179       ZMxx005  0.6           766
180       ZMxx006  0.6           789
181      ZMOxx001  0.6           789
182      ZMOxx002  0.6           789
183      ZMOxx003  0.6           782
184      ZMOxx005  0.6           789
185      ZMOha001  0.6           742
186      ZMOwh002  0.6           770
187      ZMOwh003  0.6           789
188      ZMOwh004  0.6           789
189      ZMOwh005  0.6           770
190      ZMOwh006  0.6           789
191      ZMOvu002  0.6           789
192      ZMOvu003  0.6           789
193      ZMOvu004  0.6           787
194      ZMOvu005  0.6           788
195      ZMOmo001  0.6           753
196      ZMOpa001  0.6           788
197      ZMOpa002  0.6           789
198       ZNni001  0.6           783
199      ZERxx002  0.6           789
200      ZERxx004  0.6           789
201      ZERxx005  0.6           789
202     Zsim23588  0.6           789
203     Zsim28142  0.6           789
204     Zsim30897  0.6           787
205     Zpal23498  0.6           782
206     Zpal23522  0.6           789
207     Zmon20893  0.6           789
208     Zmon20892  0.6           781
209     Zmon20902  0.6           789
210     Zmon20909  0.6           789
211     Zsim31166  0.6           784
212     Zsim31159  0.6           789
213     Zmey17876  0.6           789
214     Zmey17877  0.6           789
215     Zmey17852  0.6           788
216     Zmey17922  0.6           789
217     Zmey17920  0.6           789
218     Zmey17925  0.6           789
219     Zmey17923  0.6           789
220     Zery28090  0.6           789
221     Zery28091  0.6           781
222     Zery28088  0.6           731
223      Zni10863  0.6           784
224      Zni14341  0.6           789
225      Zni19650  0.6           788
226      Zni17984  0.6           784
227  ZjaDOT-10981  0.6           789
228   ZjaDOT-5235  0.6           789
229 Z_LA_122866_2  0.6           789
230 Z_LA_122577_2  0.6           789
231 Z_LA_122188_2  0.6           789
232     Zsim13809  0.6           758
233     Zsim13773  0.6           789
234      Zsim6797  0.6           789
235     Zsim10336  0.6           781
236     Zsim11362  0.6           789
237     Zsim11102  0.6           789
238     Zsim11220  0.6           789
239     Zmon20899  0.6           784
240     Zmon28375  0.6           780
241      Zev13949  0.6           789
242      Zev31650  0.6           781
243      Zev28451  0.6           789
244   Z_HI_BRY431  0.6           787
245   Z_HI_NAN290  0.6           789
246   Z_HI_WAI087  0.6           757
247   Z_HI_WAI078  0.6           781
248   Z_HI_SOL783  0.6           789
249       ZJlo012  0.7           789
250       ZJlo015  0.7           789
251       ZJlo016  0.7           789
252       ZJlo017  0.7           789
253       ZJlo021  0.7           789
254       ZJlo022  0.7           789
255       ZJlo024  0.7           789
256       ZJlo031  0.7           789
257       ZJlo045  0.7           782
258       ZJlo046  0.7           789
259       ZJlo050  0.7           788
260       ZJlo052  0.7           789
261       ZJlo055  0.7           789
262       ZJja001  0.7           771
263       ZJja002  0.7           789
264       ZJja003  0.7           789
265       ZJja004  0.7           760
266       ZJja005  0.7           789
267       ZJja009  0.7           789
268       ZJja010  0.7           778
269       ZJja012  0.7           789
270       ZJja013  0.7           789
271       ZJja014  0.7           789
272       ZJja016  0.7           789
273       ZJja030  0.7           789
274       ZJal003  0.7           789
275       Zjal004  0.7           789
276       ZJal005  0.7           789
277       ZJal010  0.7           789
278       ZJal011  0.7           789
279       ZJal012  0.7           789
280       ZJal020  0.7           789
281       ZJst007  0.7           789
282       ZJst009  0.7           789
283       ZJst010  0.7           789
284       ZJst012  0.7           789
285       ZJst014  0.7           789
286       ZJst015  0.7           789
287       ZJin001  0.7           789
288       ZJin003  0.7           789
289       ZJsi002  0.7           789
290       ZJsi017  0.7           781
291       ZJsi032  0.7           747
292       ZJsi023  0.7           789
293       ZJsi024  0.7           768
294       ZJsi027  0.7           758
295       ZJsi028  0.7           787
296       ZJsi029  0.7           789
297       ZJsi031  0.7           737
298       ZPxx001  0.7           789
299       ZPxx002  0.7           789
300       ZMxx002  0.7           789
301       ZMxx003  0.7           789
302       ZMxx004  0.7           789
303       ZMxx005  0.7           766
304       ZMxx006  0.7           789
305      ZMOxx001  0.7           789
306      ZMOxx002  0.7           789
307      ZMOxx003  0.7           782
308      ZMOxx005  0.7           789
309      ZMOha001  0.7           742
310      ZMOwh002  0.7           770
311      ZMOwh003  0.7           789
312      ZMOwh004  0.7           789
313      ZMOwh005  0.7           770
314      ZMOwh006  0.7           789
315      ZMOvu002  0.7           789
316      ZMOvu003  0.7           789
317      ZMOvu004  0.7           787
318      ZMOvu005  0.7           788
319      ZMOmo001  0.7           753
320      ZMOpa001  0.7           788
321      ZMOpa002  0.7           789
322       ZNni001  0.7           783
323      ZERxx002  0.7           789
324      ZERxx004  0.7           789
325      ZERxx005  0.7           789
326     Zsim23588  0.7           789
327     Zsim28142  0.7           789
328     Zsim30897  0.7           787
329     Zpal23498  0.7           782
330     Zpal23522  0.7           789
331     Zmon20893  0.7           789
332     Zmon20892  0.7           781
333     Zmon20902  0.7           789
334     Zmon20909  0.7           789
335     Zsim31166  0.7           784
336     Zsim31159  0.7           789
337     Zmey17876  0.7           789
338     Zmey17877  0.7           789
339     Zmey17852  0.7           788
340     Zmey17922  0.7           789
341     Zmey17920  0.7           789
342     Zmey17925  0.7           789
343     Zmey17923  0.7           789
344     Zery28090  0.7           789
345     Zery28091  0.7           781
346     Zery28088  0.7           731
347      Zni10863  0.7           784
348      Zni14341  0.7           789
349      Zni19650  0.7           788
350      Zni17984  0.7           784
351  ZjaDOT-10981  0.7           789
352   ZjaDOT-5235  0.7           789
353 Z_LA_122866_2  0.7           789
354 Z_LA_122577_2  0.7           789
355 Z_LA_122188_2  0.7           789
356     Zsim13809  0.7           758
357     Zsim13773  0.7           789
358      Zsim6797  0.7           789
359     Zsim10336  0.7           781
360     Zsim11362  0.7           789
361     Zsim11102  0.7           789
362     Zsim11220  0.7           789
363     Zmon20899  0.7           784
364     Zmon28375  0.7           780
365      Zev13949  0.7           789
366      Zev31650  0.7           781
367      Zev28451  0.7           789
368   Z_HI_BRY431  0.7           787
369   Z_HI_NAN290  0.7           789
370   Z_HI_WAI087  0.7           757
371   Z_HI_WAI078  0.7           781
372   Z_HI_SOL783  0.7           789
373       ZJlo012  0.8           789
374       ZJlo015  0.8           789
375       ZJlo016  0.8           789
376       ZJlo017  0.8           789
377       ZJlo021  0.8           789
378       ZJlo022  0.8           789
379       ZJlo024  0.8           789
380       ZJlo031  0.8           789
381       ZJlo045  0.8           782
382       ZJlo046  0.8           789
383       ZJlo050  0.8           788
384       ZJlo052  0.8           789
385       ZJlo055  0.8           789
386       ZJja001  0.8           771
387       ZJja002  0.8           789
388       ZJja003  0.8           789
389       ZJja004  0.8           760
390       ZJja005  0.8           789
391       ZJja009  0.8           789
392       ZJja010  0.8           778
393       ZJja012  0.8           789
394       ZJja013  0.8           789
395       ZJja014  0.8           789
396       ZJja016  0.8           789
397       ZJja030  0.8           789
398       ZJal003  0.8           789
399       Zjal004  0.8           789
400       ZJal005  0.8           789
401       ZJal010  0.8           789
402       ZJal011  0.8           789
403       ZJal012  0.8           789
404       ZJal020  0.8           789
405       ZJst007  0.8           789
406       ZJst009  0.8           789
407       ZJst010  0.8           789
408       ZJst012  0.8           789
409       ZJst014  0.8           789
410       ZJst015  0.8           789
411       ZJin001  0.8           789
412       ZJin003  0.8           789
413       ZJsi002  0.8           789
414       ZJsi017  0.8           781
415       ZJsi032  0.8           747
416       ZJsi023  0.8           789
417       ZJsi024  0.8           768
418       ZJsi027  0.8           758
419       ZJsi028  0.8           787
420       ZJsi029  0.8           789
421       ZJsi031  0.8           737
422       ZPxx001  0.8           789
423       ZPxx002  0.8           789
424       ZMxx002  0.8           789
425       ZMxx003  0.8           789
426       ZMxx004  0.8           789
427       ZMxx005  0.8           766
428       ZMxx006  0.8           789
429      ZMOxx001  0.8           789
430      ZMOxx002  0.8           789
431      ZMOxx003  0.8           782
432      ZMOxx005  0.8           789
433      ZMOha001  0.8           742
434      ZMOwh002  0.8           770
435      ZMOwh003  0.8           789
436      ZMOwh004  0.8           789
437      ZMOwh005  0.8           770
438      ZMOwh006  0.8           789
439      ZMOvu002  0.8           789
440      ZMOvu003  0.8           789
441      ZMOvu004  0.8           787
442      ZMOvu005  0.8           788
443      ZMOmo001  0.8           753
444      ZMOpa001  0.8           788
445      ZMOpa002  0.8           789
446       ZNni001  0.8           783
447      ZERxx002  0.8           789
448      ZERxx004  0.8           789
449      ZERxx005  0.8           789
450     Zsim23588  0.8           789
451     Zsim28142  0.8           789
452     Zsim30897  0.8           787
453     Zpal23498  0.8           782
454     Zpal23522  0.8           789
455     Zmon20893  0.8           789
456     Zmon20892  0.8           781
457     Zmon20902  0.8           789
458     Zmon20909  0.8           789
459     Zsim31166  0.8           784
460     Zsim31159  0.8           789
461     Zmey17876  0.8           789
462     Zmey17877  0.8           789
463     Zmey17852  0.8           788
464     Zmey17922  0.8           789
465     Zmey17920  0.8           789
466     Zmey17925  0.8           789
467     Zmey17923  0.8           789
468     Zery28090  0.8           789
469     Zery28091  0.8           781
470     Zery28088  0.8           731
471      Zni10863  0.8           784
472      Zni14341  0.8           789
473      Zni19650  0.8           788
474      Zni17984  0.8           784
475  ZjaDOT-10981  0.8           789
476   ZjaDOT-5235  0.8           789
477 Z_LA_122866_2  0.8           789
478 Z_LA_122577_2  0.8           789
479 Z_LA_122188_2  0.8           789
480     Zsim13809  0.8           758
481     Zsim13773  0.8           789
482      Zsim6797  0.8           789
483     Zsim10336  0.8           781
484     Zsim11362  0.8           789
485     Zsim11102  0.8           789
486     Zsim11220  0.8           789
487     Zmon20899  0.8           784
488     Zmon28375  0.8           780
489      Zev13949  0.8           789
490      Zev31650  0.8           781
491      Zev28451  0.8           789
492   Z_HI_BRY431  0.8           787
493   Z_HI_NAN290  0.8           789
494   Z_HI_WAI087  0.8           757
495   Z_HI_WAI078  0.8           781
496   Z_HI_SOL783  0.8           789
497       ZJlo012  0.9           789
498       ZJlo015  0.9           789
499       ZJlo016  0.9           789
500       ZJlo017  0.9           789
501       ZJlo021  0.9           789
502       ZJlo022  0.9           789
503       ZJlo024  0.9           789
504       ZJlo031  0.9           789
505       ZJlo045  0.9           782
506       ZJlo046  0.9           789
507       ZJlo050  0.9           788
508       ZJlo052  0.9           789
509       ZJlo055  0.9           789
510       ZJja001  0.9           771
511       ZJja002  0.9           789
512       ZJja003  0.9           789
513       ZJja004  0.9           760
514       ZJja005  0.9           789
515       ZJja009  0.9           789
516       ZJja010  0.9           778
517       ZJja012  0.9           789
518       ZJja013  0.9           789
519       ZJja014  0.9           789
520       ZJja016  0.9           789
521       ZJja030  0.9           789
522       ZJal003  0.9           789
523       Zjal004  0.9           789
524       ZJal005  0.9           789
525       ZJal010  0.9           789
526       ZJal011  0.9           789
527       ZJal012  0.9           789
528       ZJal020  0.9           789
529       ZJst007  0.9           789
530       ZJst009  0.9           789
531       ZJst010  0.9           789
532       ZJst012  0.9           789
533       ZJst014  0.9           789
534       ZJst015  0.9           789
535       ZJin001  0.9           789
536       ZJin003  0.9           789
537       ZJsi002  0.9           789
538       ZJsi017  0.9           781
539       ZJsi032  0.9           747
540       ZJsi023  0.9           789
541       ZJsi024  0.9           768
542       ZJsi027  0.9           758
543       ZJsi028  0.9           787
544       ZJsi029  0.9           789
545       ZJsi031  0.9           737
546       ZPxx001  0.9           789
547       ZPxx002  0.9           789
548       ZMxx002  0.9           789
549       ZMxx003  0.9           789
550       ZMxx004  0.9           789
551       ZMxx005  0.9           766
552       ZMxx006  0.9           789
553      ZMOxx001  0.9           789
554      ZMOxx002  0.9           789
555      ZMOxx003  0.9           782
556      ZMOxx005  0.9           789
557      ZMOha001  0.9           742
558      ZMOwh002  0.9           770
559      ZMOwh003  0.9           789
560      ZMOwh004  0.9           789
561      ZMOwh005  0.9           770
562      ZMOwh006  0.9           789
563      ZMOvu002  0.9           789
564      ZMOvu003  0.9           789
565      ZMOvu004  0.9           787
566      ZMOvu005  0.9           788
567      ZMOmo001  0.9           753
568      ZMOpa001  0.9           788
569      ZMOpa002  0.9           789
570       ZNni001  0.9           783
571      ZERxx002  0.9           789
572      ZERxx004  0.9           789
573      ZERxx005  0.9           789
574     Zsim23588  0.9           789
575     Zsim28142  0.9           789
576     Zsim30897  0.9           787
577     Zpal23498  0.9           782
578     Zpal23522  0.9           789
579     Zmon20893  0.9           789
580     Zmon20892  0.9           781
581     Zmon20902  0.9           789
582     Zmon20909  0.9           789
583     Zsim31166  0.9           784
584     Zsim31159  0.9           789
585     Zmey17876  0.9           789
586     Zmey17877  0.9           789
587     Zmey17852  0.9           788
588     Zmey17922  0.9           789
589     Zmey17920  0.9           789
590     Zmey17925  0.9           789
591     Zmey17923  0.9           789
592     Zery28090  0.9           789
593     Zery28091  0.9           781
594     Zery28088  0.9           731
595      Zni10863  0.9           784
596      Zni14341  0.9           789
597      Zni19650  0.9           788
598      Zni17984  0.9           784
599  ZjaDOT-10981  0.9           789
600   ZjaDOT-5235  0.9           789
601 Z_LA_122866_2  0.9           789
602 Z_LA_122577_2  0.9           789
603 Z_LA_122188_2  0.9           789
604     Zsim13809  0.9           758
605     Zsim13773  0.9           789
606      Zsim6797  0.9           789
607     Zsim10336  0.9           781
608     Zsim11362  0.9           789
609     Zsim11102  0.9           789
610     Zsim11220  0.9           789
611     Zmon20899  0.9           784
612     Zmon28375  0.9           780
613      Zev13949  0.9           789
614      Zev31650  0.9           781
615      Zev28451  0.9           789
616   Z_HI_BRY431  0.9           787
617   Z_HI_NAN290  0.9           789
618   Z_HI_WAI087  0.9           757
619   Z_HI_WAI078  0.9           781
620   Z_HI_SOL783  0.9           789
621       ZJlo012  1.0           203
622       ZJlo015  1.0           203
623       ZJlo016  1.0           203
624       ZJlo017  1.0           203
625       ZJlo021  1.0           203
626       ZJlo022  1.0           203
627       ZJlo024  1.0           203
628       ZJlo031  1.0           203
629       ZJlo045  1.0           203
630       ZJlo046  1.0           203
631       ZJlo050  1.0           203
632       ZJlo052  1.0           203
633       ZJlo055  1.0           203
634       ZJja001  1.0           203
635       ZJja002  1.0           203
636       ZJja003  1.0           203
637       ZJja004  1.0           203
638       ZJja005  1.0           203
639       ZJja009  1.0           203
640       ZJja010  1.0           203
641       ZJja012  1.0           203
642       ZJja013  1.0           203
643       ZJja014  1.0           203
644       ZJja016  1.0           203
645       ZJja030  1.0           203
646       ZJal003  1.0           203
647       Zjal004  1.0           203
648       ZJal005  1.0           203
649       ZJal010  1.0           203
650       ZJal011  1.0           203
651       ZJal012  1.0           203
652       ZJal020  1.0           203
653       ZJst007  1.0           203
654       ZJst009  1.0           203
655       ZJst010  1.0           203
656       ZJst012  1.0           203
657       ZJst014  1.0           203
658       ZJst015  1.0           203
659       ZJin001  1.0           203
660       ZJin003  1.0           203
661       ZJsi002  1.0           203
662       ZJsi017  1.0           203
663       ZJsi032  1.0           203
664       ZJsi023  1.0           203
665       ZJsi024  1.0           203
666       ZJsi027  1.0           203
667       ZJsi028  1.0           203
668       ZJsi029  1.0           203
669       ZJsi031  1.0           203
670       ZPxx001  1.0           203
671       ZPxx002  1.0           203
672       ZMxx002  1.0           203
673       ZMxx003  1.0           203
674       ZMxx004  1.0           203
675       ZMxx005  1.0           203
676       ZMxx006  1.0           203
677      ZMOxx001  1.0           203
678      ZMOxx002  1.0           203
679      ZMOxx003  1.0           203
680      ZMOxx005  1.0           203
681      ZMOha001  1.0           203
682      ZMOwh002  1.0           203
683      ZMOwh003  1.0           203
684      ZMOwh004  1.0           203
685      ZMOwh005  1.0           203
686      ZMOwh006  1.0           203
687      ZMOvu002  1.0           203
688      ZMOvu003  1.0           203
689      ZMOvu004  1.0           203
690      ZMOvu005  1.0           203
691      ZMOmo001  1.0           203
692      ZMOpa001  1.0           203
693      ZMOpa002  1.0           203
694       ZNni001  1.0           203
695      ZERxx002  1.0           203
696      ZERxx004  1.0           203
697      ZERxx005  1.0           203
698     Zsim23588  1.0           203
699     Zsim28142  1.0           203
700     Zsim30897  1.0           203
701     Zpal23498  1.0           203
702     Zpal23522  1.0           203
703     Zmon20893  1.0           203
704     Zmon20892  1.0           203
705     Zmon20902  1.0           203
706     Zmon20909  1.0           203
707     Zsim31166  1.0           203
708     Zsim31159  1.0           203
709     Zmey17876  1.0           203
710     Zmey17877  1.0           203
711     Zmey17852  1.0           203
712     Zmey17922  1.0           203
713     Zmey17920  1.0           203
714     Zmey17925  1.0           203
715     Zmey17923  1.0           203
716     Zery28090  1.0           203
717     Zery28091  1.0           203
718     Zery28088  1.0           203
719      Zni10863  1.0           203
720      Zni14341  1.0           203
721      Zni19650  1.0           203
722      Zni17984  1.0           203
723  ZjaDOT-10981  1.0           203
724   ZjaDOT-5235  1.0           203
725 Z_LA_122866_2  1.0           203
726 Z_LA_122577_2  1.0           203
727 Z_LA_122188_2  1.0           203
728     Zsim13809  1.0           203
729     Zsim13773  1.0           203
730      Zsim6797  1.0           203
731     Zsim10336  1.0           203
732     Zsim11362  1.0           203
733     Zsim11102  1.0           203
734     Zsim11220  1.0           203
735     Zmon20899  1.0           203
736     Zmon28375  1.0           203
737      Zev13949  1.0           203
738      Zev31650  1.0           203
739      Zev28451  1.0           203
740   Z_HI_BRY431  1.0           203
741   Z_HI_NAN290  1.0           203
742   Z_HI_WAI087  1.0           203
743   Z_HI_WAI078  1.0           203
744   Z_HI_SOL783  1.0           203
#convert to genlight
gen<-vcfR2genlight(test)
#assign populations (a StaMPP requirement)
gen@pop<-as.factor(gen@ind.names)
#generate pairwise divergence matrix
sample.div <- stamppNeisD(gen, pop = FALSE)
#export for splitstree
stamppPhylip(distance.mat=sample.div, file="~/Desktop/cali.zosterops.rad/test.99.splits.txt")

99% complete splitstree

99% completeness threshold shows that samples ‘Zpxx002’ and ‘ZJsi027’ are really intermediate, not just weakly assigned due to excess missing data.

Try setting the threshold at 90% (should retain around 15K high quality SNPs)

vcfR.trimmed<-missing_by_snp(vcfR=vcfR.trim, cutoff = .9)
cutoff is specified, filtered vcfR object will be returned
90.92% of SNPs fell below a completeness cutoff of 0.9 and were removed from the VCF

miss<-missing_by_sample(vcfR.trimmed)
No popmap provided

vcfR.trimmed
***** Object of Class vcfR *****
124 samples
239 CHROMs
15,722 variants
Object size: 166.6 Mb
5.316 percent missing data
*****        *****         *****
#convert to genlight
gen<-vcfR2genlight(vcfR.trimmed)
#assign populations (a StaMPP requirement)
gen@pop<-as.factor(gen@ind.names)
#generate pairwise divergence matrix
sample.div <- stamppNeisD(gen, pop = FALSE)
#export for splitstree
stamppPhylip(distance.mat=sample.div, file="~/Desktop/cali.zosterops.rad/filtered.90.splits.txt")

Now with a greater number of SNPs (15K vs 1K) we can parse fine-scale geographic structure in Z. japonicus. Our intermediate samples show up in the same places and the relationships between clades stay the same, indicating that we can safely set this 90% completeness threshold, allowing ~5% missing genotypes into the matrix, without biasing our estimates of relatedness between samples.

6 Remove overlapping SNPs

It is a known thing (see this) that Stacks will not merge SNPs called twice if they are sequenced by separate (but physically overlapping) loci. To account for this, we will simply remove a SNP every time its chromosome and position have already been seen in the dataset with the following code:

#generate dataframe containing information for chromosome and bp locality of each SNP
df<-as.data.frame(vcfR.trimmed@fix[,1:2])
#calc number of duplicated SNPs to remove
nrow(df) - length(unique(paste(df$CHROM,df$POS)))
[1] 18
#remove duplicated SNPs
vcfR.trimmed<-vcfR.trimmed[!duplicated(paste(df$CHROM,df$POS)),]

7 Visualize depth and quality across all retained genotypes

#plot depth per snp and per sample
dp <- extract.gt(vcfR.trimmed, element = "DP", as.numeric=TRUE)
heatmap.bp(dp, rlabels = FALSE)

#plot genotype quality per snp and per sample
gq <- extract.gt(vcfR.trimmed, element = "GQ", as.numeric=TRUE)
heatmap.bp(gq, rlabels = FALSE)

8 linkage filter

Now filter for linkage

#perform linkage filtering to get a reduced vcf with only unlinked SNPs
vcfR.thin<-distance_thin(vcfR.trimmed, min.distance = 1000)

  |                                                                            
  |                                                                      |   0%
  |                                                                            
  |=                                                                     |   1%
  |                                                                            
  |=                                                                     |   2%
  |                                                                            
  |==                                                                    |   3%
  |                                                                            
  |===                                                                   |   4%
  |                                                                            
  |===                                                                   |   5%
  |                                                                            
  |====                                                                  |   5%
  |                                                                            
  |====                                                                  |   6%
  |                                                                            
  |=====                                                                 |   7%
  |                                                                            
  |=====                                                                 |   8%
  |                                                                            
  |======                                                                |   8%
  |                                                                            
  |======                                                                |   9%
  |                                                                            
  |=======                                                               |  10%
  |                                                                            
  |========                                                              |  11%
  |                                                                            
  |========                                                              |  12%
  |                                                                            
  |=========                                                             |  13%
  |                                                                            
  |==========                                                            |  14%
  |                                                                            
  |==========                                                            |  15%
  |                                                                            
  |===========                                                           |  15%
  |                                                                            
  |===========                                                           |  16%
  |                                                                            
  |============                                                          |  17%
  |                                                                            
  |============                                                          |  18%
  |                                                                            
  |=============                                                         |  18%
  |                                                                            
  |=============                                                         |  19%
  |                                                                            
  |==============                                                        |  20%
  |                                                                            
  |==============                                                        |  21%
  |                                                                            
  |===============                                                       |  21%
  |                                                                            
  |===============                                                       |  22%
  |                                                                            
  |================                                                      |  22%
  |                                                                            
  |================                                                      |  23%
  |                                                                            
  |=================                                                     |  24%
  |                                                                            
  |=================                                                     |  25%
  |                                                                            
  |==================                                                    |  25%
  |                                                                            
  |==================                                                    |  26%
  |                                                                            
  |===================                                                   |  27%
  |                                                                            
  |===================                                                   |  28%
  |                                                                            
  |====================                                                  |  28%
  |                                                                            
  |====================                                                  |  29%
  |                                                                            
  |=====================                                                 |  29%
  |                                                                            
  |=====================                                                 |  30%
  |                                                                            
  |=====================                                                 |  31%
  |                                                                            
  |======================                                                |  31%
  |                                                                            
  |======================                                                |  32%
  |                                                                            
  |=======================                                               |  32%
  |                                                                            
  |=======================                                               |  33%
  |                                                                            
  |========================                                              |  34%
  |                                                                            
  |========================                                              |  35%
  |                                                                            
  |=========================                                             |  35%
  |                                                                            
  |=========================                                             |  36%
  |                                                                            
  |==========================                                            |  37%
  |                                                                            
  |==========================                                            |  38%
  |                                                                            
  |===========================                                           |  38%
  |                                                                            
  |===========================                                           |  39%
  |                                                                            
  |============================                                          |  39%
  |                                                                            
  |============================                                          |  40%
  |                                                                            
  |============================                                          |  41%
  |                                                                            
  |=============================                                         |  41%
  |                                                                            
  |=============================                                         |  42%
  |                                                                            
  |==============================                                        |  42%
  |                                                                            
  |==============================                                        |  43%
  |                                                                            
  |==============================                                        |  44%
  |                                                                            
  |===============================                                       |  44%
  |                                                                            
  |===============================                                       |  45%
  |                                                                            
  |================================                                      |  45%
  |                                                                            
  |================================                                      |  46%
  |                                                                            
  |=================================                                     |  46%
  |                                                                            
  |=================================                                     |  47%
  |                                                                            
  |=================================                                     |  48%
  |                                                                            
  |==================================                                    |  48%
  |                                                                            
  |==================================                                    |  49%
  |                                                                            
  |===================================                                   |  49%
  |                                                                            
  |===================================                                   |  50%
  |                                                                            
  |===================================                                   |  51%
  |                                                                            
  |====================================                                  |  51%
  |                                                                            
  |====================================                                  |  52%
  |                                                                            
  |=====================================                                 |  52%
  |                                                                            
  |=====================================                                 |  53%
  |                                                                            
  |=====================================                                 |  54%
  |                                                                            
  |======================================                                |  54%
  |                                                                            
  |======================================                                |  55%
  |                                                                            
  |=======================================                               |  55%
  |                                                                            
  |=======================================                               |  56%
  |                                                                            
  |========================================                              |  56%
  |                                                                            
  |========================================                              |  57%
  |                                                                            
  |========================================                              |  58%
  |                                                                            
  |=========================================                             |  58%
  |                                                                            
  |=========================================                             |  59%
  |                                                                            
  |==========================================                            |  59%
  |                                                                            
  |==========================================                            |  60%
  |                                                                            
  |==========================================                            |  61%
  |                                                                            
  |===========================================                           |  61%
  |                                                                            
  |===========================================                           |  62%
  |                                                                            
  |============================================                          |  62%
  |                                                                            
  |============================================                          |  63%
  |                                                                            
  |=============================================                         |  64%
  |                                                                            
  |=============================================                         |  65%
  |                                                                            
  |==============================================                        |  65%
  |                                                                            
  |==============================================                        |  66%
  |                                                                            
  |===============================================                       |  67%
  |                                                                            
  |===============================================                       |  68%
  |                                                                            
  |================================================                      |  68%
  |                                                                            
  |================================================                      |  69%
  |                                                                            
  |=================================================                     |  69%
  |                                                                            
  |=================================================                     |  70%
  |                                                                            
  |=================================================                     |  71%
  |                                                                            
  |==================================================                    |  71%
  |                                                                            
  |==================================================                    |  72%
  |                                                                            
  |===================================================                   |  72%
  |                                                                            
  |===================================================                   |  73%
  |                                                                            
  |====================================================                  |  74%
  |                                                                            
  |====================================================                  |  75%
  |                                                                            
  |=====================================================                 |  75%
  |                                                                            
  |=====================================================                 |  76%
  |                                                                            
  |======================================================                |  77%
  |                                                                            
  |======================================================                |  78%
  |                                                                            
  |=======================================================               |  78%
  |                                                                            
  |=======================================================               |  79%
  |                                                                            
  |========================================================              |  79%
  |                                                                            
  |========================================================              |  80%
  |                                                                            
  |=========================================================             |  81%
  |                                                                            
  |=========================================================             |  82%
  |                                                                            
  |==========================================================            |  82%
  |                                                                            
  |==========================================================            |  83%
  |                                                                            
  |===========================================================           |  84%
  |                                                                            
  |===========================================================           |  85%
  |                                                                            
  |============================================================          |  85%
  |                                                                            
  |============================================================          |  86%
  |                                                                            
  |=============================================================         |  87%
  |                                                                            
  |==============================================================        |  88%
  |                                                                            
  |==============================================================        |  89%
  |                                                                            
  |===============================================================       |  90%
  |                                                                            
  |================================================================      |  91%
  |                                                                            
  |================================================================      |  92%
  |                                                                            
  |=================================================================     |  92%
  |                                                                            
  |=================================================================     |  93%
  |                                                                            
  |==================================================================    |  94%
  |                                                                            
  |==================================================================    |  95%
  |                                                                            
  |===================================================================   |  95%
  |                                                                            
  |===================================================================   |  96%
  |                                                                            
  |====================================================================  |  97%
  |                                                                            
  |===================================================================== |  98%
  |                                                                            
  |===================================================================== |  99%
  |                                                                            
  |======================================================================| 100%
1554 out of 15704 input SNPs were not located within 1000 base-pairs of another SNP and were retained despite filtering

9 write vcf to disk for downstream analyses

#get info for all SNPs passing filtering vcf dataset
vcfR.trimmed
***** Object of Class vcfR *****
124 samples
239 CHROMs
15,704 variants
Object size: 166.5 Mb
5.314 percent missing data
*****        *****         *****
#write to disk
#vcfR::write.vcf(vcfR.trimmed, file = "~/Desktop/cali.zosterops.rad/filtered.snps.vcf.gz")

#get info for the thinned SNP dataset
vcfR.thin
***** Object of Class vcfR *****
124 samples
239 CHROMs
1,554 variants
Object size: 16.3 Mb
5.522 percent missing data
*****        *****         *****
#write to disk
#vcfR::write.vcf(vcfR.thin, file = "~/Desktop/cali.zosterops.rad/filtered.unlinked.snps.vcf.gz")

References

DeRaad, Devon A. 2022. “Snpfiltr: An R Package for Interactive and Reproducible SNP Filtering.” Molecular Ecology Resources 22 (6): 2443–53. https://doi.org/10.1111/1755-0998.13618.
Li, Heng, and Richard Durbin. 2009. “Fast and accurate short read alignment with Burrows-Wheeler transform.” Bioinformatics (Oxford, England) 25 (14): 1754–60. https://doi.org/10.1093/bioinformatics/btp324.
Venkatraman, Madhvi, Robert C. Fleischer, and Mirian T. N. Tsuchiya. 2021. “Comparative Analysis of Annotation Pipelines Using the First Japanese White-Eye (Zosterops japonicus) Genome.” Genome Biology and Evolution 13 (5): evab063. https://doi.org/10.1093/gbe/evab063.